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Abstract 

The aim of this work is studying the use of copulas and vines in the optimization 
with Estimation of Distribution Algorithms (ED As). Two ED As are built around the 
multivariate product and normal copulas, and other two are based on pair-copula decom- 
position of vine models. Empirically we study the effect of both marginal distributions 
and dependence structure separately, and show that both aspects play a crucial role in the 
success of the optimization. The results show that the use of copulas and vines opens new 
opportunities to a more appropriate modeling of search distributions in EDAs. 



1 Introduction 



Estimation of distribution algorithms (EDAs) |31[ 33 are stochastic optimization methods char- 
acterized by the expHcit use of probabiHstic models. EDAs explore the search space by sampling 
a probability distribution (search distribution) previously built from promising solutions. 

Most existing continuous EDAs are based on either the multivariate normal distribution 
or models derived from it |9] |28| . However, in situations where empirical evidence reveals 
significant departures from the normality assumption, these EDAs construct incorrect models 
of the search space. A solution come with the copula function [34], which provides a way to 
separate the statistical properties of each variable from the dependence structure: first, the 
marginal distributions are fitted using a rich variety of univariate models available, and then, 
the dependence between the variables is modeled using a copula. However, the multivariate 
copula approach has limitations. The number of multivariate copulas is rather limited, and 
usually these copulas have only one parameter to describe the overall dependence. Thus, this 
approach is not appropriate when all the pairs of variables do not have the same type or 
strength of dependence. For instance, the t-copula uses one correlation coefficient per each 
pair of variables, but has only a single degree of freedom parameter to characterize the tail 
dependence for all pairs. 



An alternative approach to this problem is the pair-copula construction method (PCC) 



[6j [7 26 , which allows to built multivariate distributions using only bivariate copulas. PCC 
models of multivariate distributions are represented in a graphical way as a sequence of nested 
trees, which are called vines. These graphical models provide a powerful and flexible tool to deal 
with complex dependences as far as the pair-copulas in the decomposition can be of different 
copula families. 

In recent years, several copula-based EDAs have been proposed in the literature. The 
authors have studied the behavior of these algorithms in test functions [l2 17, 21 22 39 44 



45 48 and a real-world problem 46 . Indeed, the use of copulas has been identified as one of 
the emerging trends in the optimization of real- valued problems using EDAs [25]. In this work, 
various models based on copula theory are combined in an EDA: two models are built using the 
multivariate product and normal copulas and other two are based on two PCC models called 
C-vine and D-vine. We empirically evaluate the performance of these algorithms on a set of 
test functions and show that vine-based EDAs are better endowed to deal with problems with 
different dependences between pair of variables. 

The paper is organized as follows. Section [2] introduces the notion of copula and describes 
two EDAs based on the multivariate product and normal copulas, respectively. Section [3] 
presents the notion and terminology of vines and introduces two EDAs based on C-vine and 
D-vine models, respectively. Section |4] reports and discuses the empirical investigation. Finally, 
Section [5] gives the conclusions. 



2 Two Continuous EDAs Based on Multivariate Copulas 



We start with some definitions from copula theory 27 34 . Consider n random variables 
X = (Xi, . . . , Xn) with joint cumulative distribution function F and joint density function 
/. Let X = (xi, . . . ,Xn) be an observation of X. A copula C is a multivariate distribution 



with uniformly distributed marginals (0, 1) on [0,1]. Sklar's theorem 43 states that every 
multivariate distribution F with marginals Fi, F2, . . . , Fn can be written as 



Fixi 



,x.„)^C{F ixi),...,Fixn)) 



and 



C {u^, un) - F (f[-^^ (ui) , . . . , K)) 



where F^ are the inverse distribution functions of the marginals. If F is continuous then 
C . . . , Un) is unique. The notion of copulas separates the effect of dependence and margins 
in a joint distribution 
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The copula C provides all information about the dependence 
structure of F, independently of the specification of the marginal distributions. 

An immediate consequence of Sklar's theorem is that random variables are independent if 
and only if their underlying copula is the independence or product copula Ci, which is given by 



Cl (Ui, . . . ,W„) = Wi Un- (1) 

The UMDA proposed in |31| assumes a model of independence of normal margins. There- 
fore, an EDA based on the product copula is a generalization of the UMDA, which also supports 
other types of marginal distributions. 

Besides UMDA, in fsT the authors also proposed an EDA based on the multivariate normal 
distribution. They called it Estimation of the Multivariate Normal Algorithm (EMNA). It 
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turns out that, indeed EMNA can be also reformulated in copula terms: a normal copula plus 
normal margins. 

The Gaussian Copula Estimation of Distribution Algorithm (GCEDA) proposed in [3 45 
uses the multivariate normal (or Gaussian) copula, which is given by 

C^iu^,..., u„; R) = $fl ($-1 (ui) , . . . , K)) , (2) 

where $fl is the standard multivariate normal distribution with correlation matrix R, and 
denotes the inverse of the standard univariate normal distribution. This copula allows the 
construction of multivariate distributions with non-normal margins. If this is the case, the 
joint density is no longer the multivariate normal, though the normal dependence structure is 
preserved. Therefore, with normal margins, GCEDA is equal to EMNA, otherwise they are 
different. 

If the marginal distributions are non-normal, the correlation matrix is estimated using the 
inversion of the non-parametric estimator Kendall's tau Rij = sin ('^/2fij) for each pair of 



variables i, j = 1, . . . ,n 34 . If the resulting matrix R is not positive-definite, the correction 
proposed in can be applied. 

In this work, all margins used by the algorithms are always of the same type, either normal 
(Gaussian) or empirical smoothed with a normal kernel. In particular, the estimation of the 
normal margin Fi ^ M {xi \ [li,af^ requires the computation of the mean fli and variance 
(Ji^ from the selected population. The empirical margin is estimated using the normal kernel 
estimator given by 

N 



N ^ \ h 

where the set {j/i, . . . ,1/^} is the sample of the i*'^ variable of X in the selected population with 
N individuals. The bandwidth parameter h is computed according to the rule-of-thumb of |42] . 
In this paper, the subscripts g and e in the name of the algorithms denote the use of Gaussian 
and empirical margins, respectively (e.g., UMDAg and GCEDAo). 

The generation of a new individual in GCEDAg and GCEDAe starts with the simulation 



of a vector (ui,...,w„) from the multivariate normal copula 16 . In GCEDAg, the inverse 
distribution function Xi = [ui] fii,af) is used to obtain each Xi of the new individual. In 
GCEDAe, Xi is found by solving the inverse of the marginal cumulative distribution using the 
Newton-Raphson method [4]. 



3 ED As Based on Vines 

This section provides a brief description of the C-vine and D-vine models and the motivation 
for using them to construct the search distributions in ED As. We also introduce CVEDA and 
DVEDA, our third and fourth algorithms. 



3.1 From Multivariate Copulas to Vines 

The multivariate copula approach has several limitations. Most of the available parametric 
copulas are bivariate and the multivariate extensions usually describe the overall dependence by 
means of only one parameter. This approach is not appropriate when there are pairs of variables 
with different type or strength of dependence. The pair-copula construction method (PCC) is 
an alternative approach to this problem. PCC method was originally proposed in |26| and 
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Figure 1: Four-dimensional C-vine (a) and D-vine (b). In a C-vine, each tree Tj has a unique 
node with n — j edges. The node with n — 1 edges in tree is called the root. In a D-vine, no 
node is connected to more than two edges. 



this result was later developed in |6| |7 26 . The decomposition of a multivariate distribution 
in pair-copulas is a general and flexible method for constructing multivariate distributions. 
In PCC models, bivariate copulas are used as building blocks. The graphical representation 
of these constructions involves a sequence of nested trees, called regular vines. Pair-copula 
constructions of regular vines allows to model a rich variety of types of dependences as far as 
the bivariate copulas can belong to different families. 



3.2 Pair-Copula Constructions of C-vines and D-vines 

Vines are dependence models of a multivariate distribution function based on a decomposition 
of f (xi, . . . ,Xn) into bivariate copulas and marginal densities. A vine on n variables is a 
nested set of trees Ti, . . . , r„_i, where the edges of tree j are the nodes of the tree j + 1 with 
j — l,...,n — 2. Regular vines constitute a special case of vines in which two edges in tree j 
are joined by an edge in tree j -I- 1 only if these edges share a common node. 

Two instances of regular vines are the canonical (C) and drawable (D) vines. In Figure [l] 
a graphical representation of a C-vine and D-vine for four dimensions is given. Each graphical 
model gives a specific way of decomposing the density. In particular, for a C-vine, / (xi, . . . , a;„) 
is given by 

n n— 1 n—j 

n (^fe) n n ^j,j+^i,-„j-i i^ii^i^ ■ • • > ,f {xj+^\xi, . . .,xj_i)) , (s) 

k=l j = l i=l 

and for a D-vine, the density is equal to 

n Ti— 1 n~j 

n / (-^fc) n n Ci,j+j|j+l,....j+j-l {F {x^lXi+i,. . . , X^+j-i) , F {x^+J\x^+l,. . . , X^+j.i)) , (4) 

fc=l j=l 1=1 

where j identifies the trees and i denotes the edges in each tree. 

Note that in (|3| and Q the joint density consists of marginal densities / (xk) and pair- 
copula densities evaluated at conditional distribution functions of the form F (x \ v). 

In |26| it is showed that conditional distribution of pair-copulas constructions are given by 

dC.^^^^^_^iF{x\v^,),F{v,\v^,)) 
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where C^y.\^_. is a bivariate copula distribution function, v is a n-dimensional vector, Vj is 
the j components of v and v_j denotes the remaining component. The recursive evaluation of 
F {x\'v) yields the expression 

^("'^^ = mM ■ 

For the special case (unconditional) when v is univariate, and x and v are standard uniform, 
F {x\ v) reduces further to 

F{x\v)^ Wv • 

where is the set of parameters for the bivariate copula of the joint distribution function of x 
and V. To facilitate de computation of (a; | v), the function 

(6) 

ov 

is defined. The inverse of h with respect to the first variable h^"^ is also defined. The expressions 
of these functions of the bivariate copulas used in this work are given in Appendix [X] 



3.3 Vine Estimation of Distribution Algorithms 

Vine Estimation of Distribution Algorithms (VEDAs) [21 44 are a class of EDAs that uses 
vines to model the search distributions. CVEDA and DVEDA are VEDAs based on C- vines 
and D- vines, respectively. Now we describe the particularities of the estimation and simulation 
steps of these algorithms. 



3.3.1 Estimation 

The estimation procedures of C- vines and D- vines proposed and developed in [l] consist of the 
following main steps: selection of a specific factorization, choice of the pair-copula types in the 
factorization, and estimation of the copula parameters. Next we describe these steps according 
to our implementation. 

1. Selection of a specific factorization: 

The selection of a specific pair-copula decomposition implies to choose an appropriate 
order of the variables, which can be obtained by several ways: given as parameter, selected 
at random, chosen by greedy heuristics. We use greedy heuristics for detecting the most 
important bivariate dependences. 

Assumed a specific factorization, the first step of the estimation procedure consist in 
assigning weights to the edges. As weight we use the absolute value of the empirical 
Kendall's tau between pair of variables |34 . The next step consist in determining the 
appropriate order of the variables of the decomposition, which depend on the type of 
pair-copula decomposition: 

• In a C-vine, the tree that maximizes the sum of the weights of one node (the root) 
to the others is chosen by the greedy heuristic as the appropriate factorization. 
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• In a D-vine, the first tree is that wliich maximizes the weighted sequence of the 
original variables. In |10| , this problem is transformed into a traveling salesman 
problem (TSP) instance by adding a dummy node with weight zero on all edges to the 
other nodes. For efficiency, we use the cheapest insertion heuristic, an approximate 
solution of TSP presented in [37]. In a D-vine, the structure of remaining trees is 
completely determined by the structure of the first. 

A pair-copula decomposition has n—1 trees and requires to fit "("-i)/2 copulas. Assuming 
conditional independence might simplify the estimation step, since if X and Y are con- 
ditionally independent given V, then c^yi^ {Fx\v (2; | v) , Fy^^ {y \ v)) = 1. This property 
is used by a model selection procedure proposed in [l^ , which consists in truncating the 
pair-copula decomposition at specific tree level, fitting the product copula in the subse- 
quent trees. For detecting the truncation tree level, this procedure uses either the Akaike 
Information Criterion (AIC) j2] or the Bayesian Information Criterion (BIC) |4l], such 
that the tree is expanded if the value of the information criteria calculated up to 
the tree Tj+i is smaller than the value obtained up to the previous tree. Otherwise, the 
vine is truncated at tree level Tj. 

2. Choice of the pair-copula types in the factorization and estimation of the copula param- 
eters. 

(a) Determine which pair-copula types to use in tree 1 using the original data by applying 
a goodness of fit test. 

(b) Compute observations (i.e. conditional distribution functions) using the copula pa- 
rameters from tree 1 and the h{.) function. 

(c) Determine the pair-copula types to use in tree 2 in the same way as in tree 1 using 
the observations from (b). 

(d) Repeat (b) and (c) for the following trees. 

Selection of pair-copulas is accomplished in different ways |20| . In this work, the Cramer- 
von Mises statistics 

N 

Sn = '^{CE{utVi) - C&{ui,Vi)f (7) 

i=l 

is minimized. N is the sample size, is the set of parameters of a bivariate copula C©, 
and Ce is the empirical copula. We first test the product copula flOl . If there is enough 
evidence against the null hypothesis of independence (at a fixed significance level of 0.1) 
it is rejected. If this is the case, the copula C© that minimizes Sn is chosen. 

We combine different types of bivariate copulas: normal. Student's t, Clayton, rotated 
Clayton, Gumbel and rotated Gumbel. The normal copula is neither lower nor upper 
tail dependent while the Student's t copula is both lower and upper tail dependent. The 
Clayton and rotated Clayton copulas are lower tail dependent while the Gumbel and 
rotated Gumbel copulas are upper tail dependent. 

The parameters of all these copulas, but the Student's t, are estimated using the inversion 
of Kendall's tau pl^. The correlation coefficient for the Student's and normal copulas are 
computed similarly. The degrees of freedom of the Student's t copula are estimated by 
maximum likelihood with the correlation parameter held fixed [13^ . We consider an upper 
bound of 30 for the degrees of freedom because for this value the bivariate Student's t 
copula becomes almost indistinguishable from the bivariate normal copula [15]. 
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3.3.2 Simulation 



Simulation from vines [s] [6j [30] is based on the conditional distribution method described 



m 



14 . The general algorithm for sampling n dependent uniform [0, 1] variables is common for 
C-vines and D-vines. First, sample n independent uniform random numbers Wi £ (0, 1) and 
then compute 

Xi = Wi 

X2 = F~l{w2\xi) 

^3 = ■F'3[i,2 (^3|a;i,a:2) 



F 



^|l,2,....n — 1 ■ ■ ■ 7 l) • 



To determine F [xj | xi, a;2, . . . , a^j-i) for each j, the expressions dSj and dGl are used for 
both structures, although the choice of the Vj in ([s]) is different (see (|3j and For details 

about the sampling algorithms, see 

4 Empirical Investigation 

This section outlines the experimental setup and presents the numerical results. The experi- 
ments aim to show that both aspects, the marginal distributions and the dependence structure, 
are crucial for EDA optimization. 

For the empirical study we use the statistical environment R |36 and the tools provided by 



the packages copulaedas |23j and vines 24 



4.1 Experimental Design 

The well known Sphere, Griewank, Ackley and Summation Cancellation test functions ^ are 
considered as benchmark problems in n = 10 dimensions. The definition of these functions for 
X = . . . , Xn) is given below: 



/sphere (a;) = X! 



/Griewank(a;) = 1 + E ^ " 11 ( ^ j 

2—1 i—1 ^ 



i=l 

2 " / 

X^ T— r / Xi 

cos 



/Ackley (a;) = -20exp -0.2 



\ 



^ E ~ ^^P ^~ E '^"^ (27r.'Ei)^ + 20 + exp (1) 



/^Summation Cancellation (■^) ^ c , \-^n i i ; Ul — -^l; Ui — Ui—l ~l~ 

10 '^ + T,i=im 

Sphere, Griewank and Ackley are minimization problems that have global optimum at 
a; = (0, . . . , 0) with evaluation zero. Summation Cancellation is a maximization problem that 
has global optimum at a; = (0, . . . , 0) with evaluation 10^. 

To ensure a fair comparison between the algorithms, we find the minimum population size 
required by each algorithm to reach the global optimum of the function in 30 of 30 independent 
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Table 1: Results of UMDA and GCEDA in Sphere. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 










600, 600J, I — 1, . . . , 10 








UMDAg 


30/30 


86 


3,996.1±89.5 


6.9E- 


07± 1.9E - 


07 


UMDAe 


30/30 


82 


5, 466.6 ± 164.4 


7.0E- 


07± 1.7E - 


07 


GCEDAg 


30/30 


325 


13,769.1±248.5 


6.6E- 


07± 1.6E - 


07 






259 


14, 581.7 ±403.2 


7.1E- 


07±2.0E- 


07 






X, G [- 


300,900], i = 1,...,10 








UMDAg 


30/30 


118 


5,502.7±125.8 


6.4E- 


07± 1.9E - 


07 


UMDAe 


30/30 


83 


5, 513.9 ± 180.6 


7.4E- 


07± 1.9E - 


07 


GCEDAg 


24/30 


2000 


171, 666.6 ± 166,976.4 


3.0E + 


01 ±8.4E±01 


GCEDAe 


30/30 


522 


29,023.2±541.4 


7.2E- 


07±2.3E- 


07 




Table 2: 


Results of UMDA and GCEDA in 


Griewank. 




Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 








X, e [- 


600,600], i = 1,...,10 








UMDAg 


30/30 


113 


5,179.1±210.0 


7.2E - 


07± 1.7E- 


07 


UMDAe 


30/30 


475 


27, 961.6 ± 1,387.5 


7.0E- 


07± 1.8E- 


07 


GCEDAg 


30/30 


304 


12,798.4±351.1 


6.6E- 


07± 1.7E- 


07 


GCEDAe 


30/30 


324 


17, 895.6 ±536.0 


6.7E- 


07± 1.7E- 


07 






X, e [- 


300,900], j = 1,...,10 








UMDAg 


30/30 


110 


5,261.6±284.6 


6.7E- 


07±2.1E- 


07 


UMDAe 


30/30 


449 


26, 580.8 ± 1,003.3 


7.3E- 


07± 1.7E- 


07 


GCEDAg 


22/30 


2000 


201, 333.3 ± 183,220.5 


1.3E- 


01±2.5E- 


01 


GCEDAe 


30/30 


588 


32,438.0±860.9 


8.0E- 


07± 1.5E- 


07 



runs. This critical population size is determined using a bisection method |35 . The algorithm 
stops when either the global optimum is found with a precision of 10~^ or after 500, 000 function 
evaluations. A truncation selection of 0.3 is used [32], and no elitism. 

In the initial population, each variable is sampled uniformly in a given real interval. We 
say an interval is symmetric if the value that Xi takes in the global optimum of the function is 
located in the middle of the given interval. Otherwise, we call it asymmetric. The symmetric 
intervals used in the experiments are: [—600, 600] in Sphere and Griewank, [—30, 30] in Ackley, 
and [—0.16, 0.16] in Summation Cancellation. The asymmetric intervals are: [—300,900] in 
Sphere and Griewank, [—15,45] in Ackley, and [—0.08, 0.24] in Summation Cancellation. 

4.2 Effect of the Marginal Distributions 

In this section we investigate the effect of the marginal distributions under two assumptions: 
independence and joint normal dependence. The results obtained with UMDA and GCEDA in 
symmetric and asymmetric intervals are given in Tables [l}[4] We summarize the results in the 
following four points. 

1. As the asymmetry of the interval grows the performance of all the algorithms deteriorate. 
This effect is larger with normal margins. 
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Table 3: Results of UMDA and GCEDA in Ackley. 



Algorithm 


Success 


Population Evaluations 


Best Evaluation 








[-30,30], ii = 1,...,10 




UMDAg 


30/30 


88 


5,426.6±127.2 


8.2E-07±1.0E-07 


UMDAe 


30/30 


94 


8, 024.4 ± 210.1 


8.6E-07±8.4E-08 


GCEDAg 


30/30 


325 


18, 178.3±207.8 


8.0E-07±1.5E-07 


GCEDAe 


30/30 


303 


21, 866.5 ±338.3 


8.1E-07±1.4E-07 






X, G 


[-15,45], ii = 1,...,10 




UMDAg 


30/30 


95 


5,959.6±111.3 


7.7E-07±l.lE-07 


UMDAe 


30/30 


91 


7, 995.8 ± 183.1 


8.3E-07±l.lE-07 


GCEDAg 


30/30 


782 


45, 460.2 ±532.8 


8.0E-07±1.2E-07 


GCEDAe 


30/30 


357 


26,013.4±493.7 


8.5E-07±8.2E-08 


Table 4: 


Results of UMDA and GCEDA in Summation Cancellation. 


Algorithm 


Success 


Population Evaluations 


Best Evaluation 






X, G [- 


0,16, 0,16], i = 1,...,10 




UMDAg 


0/30 


2000 


500, 000.0 ±0,0 


6.9E + 02±5.0E±02 


UMDAe 


0/30 


2000 


500, 000.0 ±0,0 


1.0E + 03± 1.2E±03 


GCEDAg 


30/30 


325 


38,848.3±327,6 


1.0E + 05±1.2E-07 


GCEDAe 


30/30 


1525 


213, 144.1 ± 1,907.3 


1.0E + 05±1.0E-07 






X, G [- 


0,08, 0,24], i = 1,...,10 




UMDAg 


0/30 


2000 


500, 000.0 ±0.0 


5.6E + 02±3.8E±02 


UMDAe 


0/30 


2000 


500, 000.0 ±0.0 


1.9E + 03±1.9E±03 


GCEDAg 


4/30 


2000 


467, 000.0 ± 85, 577.5 


1.3E + 04 ± 3.4E ± 04 


GCEDAe 


30/30 


1525 


215, 330.0±1, 621.8 


1.0E + 05±l.lE-07 



We illustrate this point through the analysis of the UMDA behavior. With symmetric 
intervals, UMDAg outperforms UMDAg, which is particularly notable in the Griewank 
function. As example. Figure [2] illustrates that the variance of the normal margin shrinks 
faster than the variance of the normal kernel margin. The larger variance of the empirical 
margin can be explained by the existence of global and local optima, all of which are 
captured by the normal kernel margins. Figure [sj- (left) shows several peaks located near 
the values that the variable takes in the global and local optima, while in Figure [Sj- (right) 
the peak of the normal density lies in the middle of the interval regardless of the shape 
of the data. For this same reason, with symmetric interval, the algorithms behave better 
with normal margins than with empirical. 

2. With asymmetric intervals, GCEDA with normal kernel margins is much better than with 
normal margins. 

With symmetric intervals, UMDA and GCEDA with normal margins behave better than 
with normal kernel margins. However, if the initial population is sampled asymmetrically, 
this situation changes, which is more remarkable in GCEDA (even GCEDAg might not 
converge) . This situation is illustrated in the optimization of the Griewank function with 
GCEDAg and GCEDAe. Figure |4] shows both the normal and normal kernel densities of 
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Normal margin 


• 




Gaussian kernel margin 


• 


- r ■ 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 

Generation 



Figure 2: Box-plots illustrating the evolution of the first variable of Griewank in the selected 
population of UMDAg (top) and UMDAg (bottom) for 15 generations. 



the first variable, which are estimated at generations 10, 15, 20, 25 and 30. We recall 
that the zero value corresponds to the value of the variable in the global optimum. In 
Figure |4]-(top), note that with normal margins the zero is located at the tail of the normal 
density, thus, it is sampled with low probability. As the evolution proceeds, the density 
moves away from zero. In Figure [^(bottom), the normal kernel margins capture more 
local features of the distribution and it is more likely that good points are sampled. 

3. In problems where UMDA exhibits good performance, the introduction of correlations by 
GCEDA seems to be harmful. 

Sphere, Griewank and Ackley can be easily optimized by UMDA as far as the marginal 
information is enough for finding the global optimum. GCEDA requires to compute 
many parameters and larger populations are needed to estimate them reliably. Figure [5] 
illustrates this issue in the Sphere function. We run UMDAg with its critical population. 
For GCEDAg we use different population sizes, including the critical population of these 
two algorithms (86 and 325, respectively). The box-plot shows that GCEDAg achieves 
the means and variances of UMDAg but uses larger populations. 

4. UMDA is not capable of optimizing Summation Cancellation. 

Summation Cancellation has multivariate linear interactions between the variables . As 
far as this information is essential for finding the global optimum, UMDA fails to optimize 
this function with both normal and kernel margins, while GCEDA is successful, though 
this algorithm is also sensitive to the effect of asymmetry. 

Summarizing, we can say that both aspects: the statistical properties of the marginal distribu- 
tions and the dependence structure play a crucial role for the success of EDA optimization. In 
the following sections we deal with the latter aspect. 
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Gaussian kernel margin 


Normal margin 


7 -\ 













-600 600 -600 



Figure 3: Histograms of the first variable of the Griewank function in the selected population 
of the second generation with UMDAe (left) and UMDAg (right). The empirical and normal 
densities are superposed, respectively. 



Table 5: Results of VEDA in Sphere with X, e [-600, 600], i = l,...,W. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 




CVEDAg^ greedy, g 
DVEDAg, greedy, g 


30/30 
30/30 


188 
207 


8, 033.8 ± 170.5 
8, 818.2 ± 192.9 


6.8E-07±2.1E- 
7.0E-07±1.8E- 


07 
07 



4.3 Effect of the Dependence Structure 

This section reports the most important results of our work. We investigate the effect of 
combining different copulas, applying the truncation strategy, and selecting the structure of 
C- vines and D- vines in the performance of VEDA. 

4.3.1 Combining Different Bivariate Copulas 

In this section we assess the effect of using different types of dependences when all the marginal 
distributions are normal. The experimental results obtained with CVEDA and DVEDA in 
Sphere, Griewank, Ackley and Summation Cancellation are presented in Tables [5}|8] respec- 
tively. The studied algorithms are CVEDAg, greedy, g and DVEDAg^ greedy, g- The sub-indexes 
mean that they perform a complete construction of the vines (9 trees) , use greedy heuristics to 
represent the stronger dependences in the first tree, and all margins are normal. 
In the investigated problems the following hold: 

1. CVEDA and DVEDA exhibit a good performance in problems with both strong and weak 
dependences between the variables. 

While UMDA uses the independence model and GCEDA assumes a linear dependence 
structure, CVEDA and DVEDA do not assume the same type of dependence across all 
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Figure 4: Marginal distributions of the first variable of Griewank with GCEDAg (top) and 
GCEDAe (bottom) in the generations 10, 15, 20, 25 and 30. 



Table 6: Results of VEDA in Griewank with £ [-600,600], i = 1, 



10. 



Algorithm Success Population Evaluations Best Evaluation 

CVEDAg, greedy, g 30/30 213 9, 151.9 ±452.6 6.5E - 07 ± 1.8E - 07 

DVEDAg, greedy, g 30/30 225 9, 630.0 ± 309.2 6.9E - 07 ± 1.5E - 07 



pairs of variables. The estimation procedures used by the vine-based algorithms select 
among a group of candidate bivariate copulas, the one that fits the data appropriately. 
CVEDA and DVEDA perform, in general, between UMDA and GCEDA in terms of the 
number of function evaluations. 

2. CVEDA exhibits better results than DVEDA in easy problems for UMDA (Sphere, 
Griewank and Ackley). 

The model used by DVEDA allows a more freely selection of the bivariate dependences 
that will be explicitly modeled, while the model used by CVEDA has a more restrictive 
structure. These characteristics enable DVEDA to fit in the first tree a greater number 
of bivariate copulas that represent dependences. This may explain why DVEDA requires 
larger sample sizes than CVEDA, and thus more function evaluations. 

3. CVEDA has much better results than DVEDA in Summation Cancellation. 

Summation Cancellation reaches its global optimum when the sum in the denominator 
of the fraction is zero. The i-th term of this sum is the sum of the first i variables of the 
function. Thus, the first variables have a greater influence in the value of the sum. The 
selected populations reflect these characteristics including stronger associations between 
the first variables and the next ones. A C-vine structure provides a more appropriate 
modeling of this situation than a D-vine structure, since it is possible to find a variable 
that governs the interactions in the sample. However, as it was pointed out before, here 
the interesting issue is the success of GCEDA. The explanation is simple. On one hand, 
Summation Cancellation has multivariate linear interactions between the variables (91. 
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Figure 5: Mean and variance of each variable in the selected population at 10* generation with 
GCEDAg and UMDAg in Sphere. GCEDAg requires larger populations than UMDAg. 



Table 7: Results of VEDA in Ackley with e [-30, 30], i = 1, . . . , 10. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 




CVEDA9, greedy, g 
DVEDAg, greedy, g 


30/30 
30/30 


213 
213 


11, 984.8 ± 184.9 
11, 920.9 ± 197.6 


7.9E-07± 1.5E- 
7.9E-07±1.3E- 


07 
07 



On the other hand, the multivariate normal distribution is indeed, a linear model of 
interactions. 

4. Combining normal and non-normal copulas worsens the results of the vine-based algo- 
rithms in Summation Cancellation. 

Since the multivariate linear interactions of Summation Cancellation are readily modeled 
with a multivariate normal dependence structure, GCEDA has better performance than 
vine-based ED As, which can fit copulas of different families (Tables |4] and [s]) . We re- 
peated the experiments using only product and normal copulas. The results show similar 
performance of CVEDAn, 9, greedy, g, DVEDAn, 9, greedy, g and GCEDA, being CVEDA 
sHghtly better than DVEDA. 

Regarding the results presented in this section, we can summarize that EDAs using pair-copula 
constructions exhibit a more robust behavior than EDAs using multivariate product or normal 
copula in the given set of benchmark functions. 
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Table 8: Results of VEDA in Summation Cancellation with X, e [-0, 16, 0, 16], i = 1, . . . , 10. 



Algorithm 




Success 


Population 


Evaluations 


Best Evaluation 




CVEDAg. 


greedy, g 


30/30 


625 


84, 958.3 ± 786.0 


1.0E + 05±1.1E- 


07 


CVEDAn, 


9, greedy, g 


30/30 


319 


43, 373.3 ± 539.5 


1.0E±05±1.3E- 


07 


DVEDAg, 


greedy, g 


30/30 


1400 


161, 840.0 ± 1,352.5 


1.0E±05±9.3E- 


08 


DVEDAn, 


9, greedy, g 


30/30 


488 


58, 494.9 ±457.3 


1.0E±05±1.3E- 


07 



Table 9: Results of VEDA with truncation in Sphere with e [-600, 600], i^l,...,10. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 




CVEDA3, greedy, g 


30/30 


175 


7, 536.6 ± 151.9 


6.5E- 


07±2.2E- 


07 


CVEDAe, greedy, g 


30/30 


191 


8, 174.8 ±176.6 


6.7E- 


07±1.9E- 


07 


CVEDAaiC, greedy, g 


30/30 


163 


7, 106.8 ±139.3 


6.6E- 


07±2.0E- 


07 


CVEDAbIC, greedy, g 


30/30 


113 


5, 017.2 ± 134.6 


6.8E- 


07± 1.6E - 


07 


DVEDA3, greedy, g 


30/30 


191 


8, 149.3 ±161.2 


6.5E- 


07± 1.8E - 


07 


DVEDAs, greedy, g 


30/30 


207 


8, 818.2 ± 128.6 


6.9E- 


07±1.8E- 


07 


DVEDAaIC, greedy, g 


30/30 


163 


6, 992.7 ± 144.2 


6.5E- 


07±1.9E- 


07 


DVEDAbIC, greedy, g 


30/30 


138 


6, 026.0 ± 127.2 


7.0E- 


07±2.2E- 


07 



4.3.2 Truncation of C-vines and D- vines 

In order to reduce the number of levels of the pair-copula decompositions, and hence simplify the 
constructions, we apply two different approaches: the truncation level is given as a parameter 



or it is determined by a model selection procedure based on AIC or BIC (see Section 3.3.11. 
We study the effect of both strategies in the Sphere and Summation Cancellation functions, as 
examples of problems with week and strong correlated variables. The following algorithms are 
compared: 

• CVEDAs^ greedy, g and DVEDAa^ greedy, g truncatc the vines at the third tree. 

• CVEDAg, greedy, g and DVEDAe. greedy, g truncatc the vines at the sixth tree. 

• CVEDAaic, greedy, g and DVEDAaic, greedy, g determine the required number of trees us- 
ing AIC. 



CVEDAbic, greedy, g and DVEDAbic, greedy, g determine the required number of trees us- 
ing BIC. 



The results of the experiments in Sphere and Summation Cancellation are presented in Tables 
[9] and [To] respectively. The main results are summarized in the following points: 

1. The algorithms that use the truncation strategy based on AIC or BIC exhibit a more 
robust behavior. 

The necessary number of trees depends on the characteristics of the function being op- 
timized. In the Sphere function, a small number of trees is quite enough, while in Sum- 
mation Cancellation it is preferable to expand the pair-copula decomposition completely. 
In both functions the better results are obtained when the truncation level is determined 
by a model selection procedure based on AIC or BIC, since cutting the model arbitrarily 
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Table 10: Results of VEDA with truncation in Summation Cancellation with Xi e 
[-0,16, 0,16], i = 1,...,10. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 




CVEDAs^ greedy, g 


0/30 


2000 


500, 000.0 ±0.0 


2.6E ± 03 ± 3.4E + 


03 


CVEDAe, greedy, g 


0/30 


2000 


500, 000.0 ± 0.0 


3.7E±04±3.2E + 


04 


CVEDAaIC, greedy, g 


30/30 


650 


90, 003.3 ± 1,262.8 


1.0E±05± 1.2E- 


07 


CVEDAbIC, greedy, g 


30/30 


800 


108, 506.6 ± 1,647.3 


1.0E±05±9.8E- 


08 


DVEDAa^ greedy, g 


0/30 


2000 


500, 000.0 ± 0.0 


8.4E±04±2.5E+ 


04 


DVEDAg, greedy, g 


10/30 


2000 


412, 133.3 ±12, 8711.1 


9.9E±04± 1.7E + 02 


DVEDAaIC, greedy, g 


30/30 


1300 


152, 750.0 ± 1,404.1 


1.0E±05± 1.0E- 


07 


DVEDAbIC, greedy, g 


26/30 


2000 


285, 000.0 ± 100,221.0 


9.9E±04±6.9E- 


03 



could cause that important dependences are not represented. The latter was the strategy 
applied in |40| , where a D-vine with normal copulas was only expanded up to the second 
tree. A combination of both strategies could be an appropriate solution. 

2. For VEDA the truncation method based on AIC is preferable than the truncation based 
on BIG. 

In the Sphere function, the vine-based ED As that use truncation based on BIG perform 
better than those based on AIG. The opposite occurs in Summation Gancellation, where 
DVEDAbic, greedy, g fail in the 30 runs. Both situations are caused by the term that 
penalizes the number of parameters in these metrics. BIG prefers models with less number 



of copulas than AIG 10 , which is good for Sphere, but compromises the convergence 
of the algorithms in Summation Gancellation. The algorithms using AIG have a good 
performance in both functions. Specifically, in Sphere the number of trees was never 
greater than three with GVEDA and four with DVEDA; in Summation Gancellation 
both algorithms perform complete construction of the vines (nine trees) . 

In the following section, we study the importance of the selection of the bivariate dependences 
explicitly modeled in the first tree of G- vines and D- vines. 



4.3.3 Selection of the Structure of C-vines and D-vines 

The aim of this section is to assess the importance of selecting an appropriate ordering of the 
variables in the pair-copula decomposition for the optimization with vine-based EDAs. 

Here we repeat the experiments with Sphere and Summation Gancellation, but this time the 
variables in the first tree in the decomposition are ordered randomly instead of representing the 
strongest bivariate dependences. The instances of the algorithms selected in these experiments 
are those that showed the best performance in the truncation experiments of the previous 
section. The results are presented in Tables [TT| and [T2| 

In the Sphere function, the algorithms that use a random structure exhibit a better perfor- 
mance, since the number of product copulas that are fitted is greater. In this case, the estimated 
model resembles independence model used by UMDA, which indeed exhibits the best perfor- 
mance with the Sphere function. The opposite occurs with Summation Gancellation, where the 
use of a random structure in the first tree causes that important correlations for an efficient 
search are not represented, which deteriorates the performance of the algorithms in terms of 
the number of function evaluations. The main conclusion of this part is that it is necessary to 
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Table 11: Results of VEDA with a random selection of the structure of the first tree of the 
vines at each generation in Sphere with Xi G [—600, 600], i = 1, ... ,10. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 




CVEDAbic, random, g 
DVEDAbic, random, g 


30/30 
30/30 


100 
100 


4, 523.3 ± 100.6 
4, 526.6 ± 114.2 


6.9E-07±1.8E- 
6.6E-07±1.6E- 


07 
07 



Table 12: Results of VEDA with a random selection of the structure of the first tree of the 
vines at each generation in Summation Cancellation with Xi G [—0, 16, 0, 16], « = 1, . . . , 10. 



Algorithm 


Success 


Population 


Evaluations 


Best Evaluation 


CVEDAaIC, random, g 


30/30 


775 


110, 360.0 ± 2, 020.9 


1.0E + 05±l.lE-07 


DVEDAaIC, random, g 


30/30 


1500 


255, 900.0 ± 5, 205.7 


1.0E + 05±1.2E-07 



make a careful selection of the structure of the pair-copula decomposition. The representation 
of the strongest dependences is important in order to construct more robust vine-based EDAs. 



5 Conclusions 

This paper introduces a class of EDAs called VEDAs. Two algorithms of this class are pre- 
sented: CVEDA and DVEDA, which model the search distributions using C- vines and D- vines, 
respectively. 

The copula EDAs based on vines are more flexible than those based on the multivariate 
product and normal copulas, because the PCC models can describe a richer variety of depen- 
dence patterns. Our empirical investigation confirms the robustness of CVEDA and DVEDA 
in both strong and weak correlated problems. 

We have found that building the complete structure of the vine is not always necessary. 
However, cutting the model at a tree selected arbitrarily could cause that important depen- 
dences are not represented. A more appropriate global strategy could be to combine setting 
a maximum number of trees with a model selection technique, such as the truncation method 
based on AIC or BIC. We also found that it is important to make a conscious selection of the 
pairwise dependences represented explicitly in the model. 

Our findings show that both the statistical properties of the margins and the dependence 
structure play a crucial role in the success of optimization. The use of copulas and vines in 
EDAs represents a new way to deal with more fiexible search distributions and different sources 
of complexity that arise in optimization. 

As future research we consider to extend the class of VEDAs with regular vines. Our algo- 
rithms have been used in the optimization of test functions, such as the ones proposed in CEC 



2005 benchmark 47 . In general, these functions display independence or linear correlations. 
In the future, we will seek problems with relevant dependences to the vine models studied in 
this work. 
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A Expressions of the h and h Functions of Various Bi- 
variate Copulas 

The pair-copulas used in this work are product, normal, Student's t, Clayton, rotated Clayton, 
Gumbel and rotated Gumbel. This appendix contains the definition of these copulas and the 
h and functions required to use this copulas in pair-copula constructions. 



The Bivariate Product Copula 

An immediate consequence of Sklar's theorem is that two random variables are independent 
if and only if their underlying copula is Ci{u,v) = uv. For this copula hi{x,v) = x and 
hY^{u, v) — u. 



The Bivariate Normal Copula 

The distribution function of the bivariate normal copula is given by 

C^{u,v-p)^<^p{^-\u),^-\v)), 

where $p is the bivariate normal distribution function with correlation parameter p and <i>~^ is 
the inverse of the standard univariate normal distribution function. For this copula the h and 
functions are 

hn {x,v;p) = $ '^^^ j , 

h^' (u, «; p) = $ ($-1 (u) ^1-p^+p («)) . 
The derivation of these formulas are given in [l]. 



The Bivariate Student's t Copula 

The distribution function of the bivariate Student's t copula is given by 

Ct{u,v;p,v) = tp.^(i^^(ii),tj;^(u)), 

where tp ^ is the distribution function of the bivariate Student's t distribution with correlation 
parameter p and v degrees of freedom and is the inverse of the univariate Student's t 
distribution function with v degrees of freedom. For this copula the h and h^^ functions are 



/ 



^ {u, v; p, v) = ty 



ht {x,v;p,v) = ty+i 
/ 

V 



/ {^+{tz\v)Y)(i-p^) 



L + {t^\v)f){i-p^) 



\ 



v + l 



I 



The derivation of these formulas are given in [T]. 
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The Bivariate Clayton Copula 

The distribution function of the bivariate Clayton copula is given by 

Cciu,v;9) = {u-' + v-'> -iy'^\ (8) 

where 6* > is a parameter controlling the dependence. Perfect dependence is obtained when 
6 ^ oo, while 0-^0 implies independence. For this copula the h and functions are 

he {x, v; 9) = v-'-^ {x-' + - ly'-'^' , 

{u,v-e) = ((uv'+Y''^'^'^ + 1 - v-'Y^" . 
The derivation of these formulas are given in fl]. 

The Bivariate Rotated Clayton Copula 

The bivariate Clayton copula, as defined in (jsj), can only capture positive dependence. Following 
the transformation used in jlO], we consider a 90 degrees rotated version of this copula. The 
distribution function of the bivariate rotated Clayton copula is obtained as 

Crc(w, v]9) = u- Cc{u, 1 - ~9), 

where < is a parameter controlling the dependence and Cq denotes the distribution function 
of the bivariate Clayton copula. For this copula the h and h'^^ functions are 

h^cix, v; 9) = hc{x, I - v; -9) 

and 

h~^{u,v;9) = h^'^{u,l ~v;-9), 

where he and h^' denote the expressions of the h and h^^ functions for the bivariate Clayton 
copula. 

The Bivariate Gumbel Copula 

The distribution function of a bivariate Gumbel copula is given by 



Cg(u,i;;6') = exp (^(-log m) + (- log u) 

where > 1 is a parameter controlling the dependence. Perfect dependence is obtained when 
9 ~> oo, while 9 — 1 implies independence. The h function is 



ho {x, v; 9) = Cg{x,v; 9) log v)'' ' (- log x)" + (- log i;)' 

V 



i/e-i 



but ho^ cannot be written in closed form; therefore, we obtain it numerically using Brent's 
method fTT|. The derivation of these formulas are given in [l]. 
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The Bivariate Rotated Gumbel Copula 

The bivariate Gumbel copula can only represent positive dependence. As for the bivariate 
Clayton copula and following the transformation used in ^lOj, we also consider a 90 degrees 
rotated version of the bivariate Gumbel copula. The distribution function of the bivariate 
rotated Gumbel copula is defined as 

Crg(w, v;d) = u- Cg{u, 1-v; -9), 

where 9 < —1 is a parameter controlling the dependence and Cq denotes the distribution 
function of the bivariate Gumbel copula. For this copula the h and functions are 

hncix, v; 9) = hG{x, 1 - w; -9) 

and 

h^^{u,v;9) = /iq^m, 1 -v;-9), 

where ho and h^^ denote the expressions of the h and h^^ functions for the bivariate Gumbel 
copula. 
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